% chapter01.tex

 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 %                                                                           %
 %    YABBY documentation                                                    %
 %    Copyright (C) 2007 Vladimir Likic                                      %
 %                                                                           %
 %    The files in this directory provided under the Creative Commons        %
 %    Attribution-NonCommercial-NoDerivs 2.1 Australia license               %
 %    http://creativecommons.org/licenses/by-nc-nd/2.1/au/                   %
 %    See the file license.txt                                               %
 %                                                                           %
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\setcounter{section}{0}

\chapter{Basics}

\section{Introduction}

Many bioinformatics tasks are solved by writing of specialised programs,
typically using scripting languages which allow rapid development and are
supported by well developed libraries. When the level of generality is
required, or when such programs manipulate large data files, testing and
debugging may involve considerable effort and time. In spite of their
inherent value, such programs are typically scattered with project-specific
data files, and are often lost after the completion of the project as a
result of data, interest, or people migration.

Yabby aims to support a long term retention of dissipate Perl program
scripts developed for bioinformatics. Yabby is a framework that consists
of an interactive shell and a library of program scripts that provide
a specific functionality. The shell provides the command line interface
to the user, and a mechanism to execute library program scripts. Program
scripts can have either stand-alone functionality, or they can be a part
of a larger processing pipeline. In the latter case, they exchange the
data through the common data model. For this purpose two different data
formats are used, two-dimensional tables and XML. Yabby is modular and
extensible, and allows users to modify existing functionality and to add
new functionality easily.

Yabby is released as open source, under the GNU Public License
version 2.

\section{Windows Installation}

If Cygwin is installed, the UNIX instructions above should be used. Both
perl and subversion can be obtained through Cygwin's
setup.exe, from any server.

\subsection{Installing Perl}

Yabby has been developed with Perl version 5.X, a freely available general
purpose scripting language. Perl stands for "Practical Extraction and Report
Language" and is particularly well suited for the manipulation of data stored
in plain text files. This seems all too often to be required in bioinformatics
applications.

The perl interpreter must be installed. There are a couple of ways to do this
but the most simple is to install ActiveState's ActivePerl.
\begin{verbatim}
http://www.activestate.com/Products/activeperl/
\end{verbatim}
At this site one can obtain a free copy of the ActivePerl installer, which
will then need to executed to install the interpreter.

\subsubsection{Installing required ActivePerl packages}

Yabby makes use of Perl's XML libraries in some of its data storage. These can
be acquired through ActivePerl's Package Manager:
\begin{description}
  \item[1] run start->programs->activeperl->perl package manager
  \item[2] press CTRL+1 to view all avaliable packages
  \item[3] scroll down to select XML-DOM and press '+' to mark for install.
  \item[4] press CTRL+Enter to install the XML package.
\end{description}

The package manager might automatically install some other XML packages too. This is OK.


\subsection{Installing a subversion client}

Google servers maintain the Yabby source code by the program called
'subversion' (an open-source version control system). To download the source
code one needs to use a subversion client program. The book about subversion
is freely available on-line at http://svnbook.red-bean.com/. Subversion has
extensive functionality however only the very basic functionality is needed to
download Yabby. A powerful and easy-to-use interface to subversion for Windows
is TortoiseSVN. It is an extension of the shell, so it can be (and must be)
used straight from Windows Explorer. It is available for free download here:

\ http://tortoisesvn.net/downloads

Run the .msi executable, and it will install. Unfortunately, {\underline{it
will require a reboot.}}

\subsection{Downloading Yabby}

Yabby source code can be browsed from the Google Code servers, at the URL:
http://code.google.com/p/yabby/. Under the section "Source" one can find the
instructions for downloading the source code. The same page provides the link
under "This project's Subversion repository can be viewed in your web browser"
which allows one to browse the source code on the server without actually
downloading it.

The following instructions refer specifically to TortoiseSVN. It is
recommended, but not necessary, that a new directory be created anywhere in
the filesystem where the Yabby source code is to be located. For example:
C:{$\backslash$}
\begin{description}
  \item[1] create C:{$\backslash$}. Then open this directory.
  
  \item[2] Right-click the empty space and select SVN Checkout.
  
  \item[3] The URL is http://yabby.googlecode.com/svn/trunk/. Click OK.
\end{description}

\subsection{Yabby installation}

Yabby installation requires that the file 'yabby/lib/yabby.pl' is modified to
to set the path to Yabby libraries.

If Yabby code was downloaded in the directory C:{$\backslash$}, to set
the path to Yabby libraries the following two lines need to be set (yabby.pl
is stored in binary format, not text. So use Wordpad.exe, found in
{\bf{Start Menu$\longrightarrow$All Programs$\rightarrow$Accessories}):
\begin{verbatim}
use lib "C:\\Yabby\\lib";
$LIB_DIR = "C:\\Yabby\\lib";
\end{verbatim}
NOTE: `{$\backslash$}' is an escape character in Perl strings, which is
also the character Windows uses to seperate directories. A double escape
character escapes the escape. Hence the need for two.

If you double-click yabby.pl now, it will start Yabby.
\begin{verbatim}
 - YABBY version 0.1 - 
   Copyright (c) 2004-7 Vladimir Likic
 [ 38 command(s) ready ]

yabby>
\end{verbatim}


\section{Running Yabby}

\subsection{An interactive session}

Yabby can be run interactively or from the command script. To
start an Yabby interactive session one needs to start the Yabby
interface from the Unix shell. Here is the simplest Yabby session:

\begin{verbatim}
$ yabby

 - YABBY version 0.1 - 

 [ 35 command(s) ready ]

yabby> quit

 bye-bye
\end{verbatim}

\subsection{Yabby command scripts}

In order to run Yabby from the command script, the command file
needs to be prepared first. Such a file lists Yabby commands
one per line, with optional blank lines (lines which start with
the \% character are ignored). For example, the following input
file, named test.yab,

\begin{verbatim}
% test.yab -- test input script

seq_load cad3.seq cad3
seq_info -l cad3
\end{verbatim}

could be run with the Unix shell input redirection:

\begin{verbatim}
$ yabby < test.yab

 - YABBY version 0.1 - 

 [ 35 command(s) ready ]

yabby> % test.yab -- test input script
yabby> yabby> 
 Reading the file 'cad3.seq' ..
 3 sequence(s) found.

yabby> 
 'cad3' contains 3 sequence(s)
  1 -> Q53650_STAAU, 192 residues
  2 -> Q97PJ0_STRPN, 193 residues
  3 -> P95773_STALU, 192 residues

yabby> yabby> 
 bye-bye
\end{verbatim}

When run from the command script the actual commands are not
echoed back, only the command's screen output as well as the
comments are seen.

\subsection{Executing Unix commands}

\index{unix commands}

Any command which is not recognized by Yabby is assumed to be
an Unix command, and Yabby will attempt to execute it. Consider
the following example:

\begin{verbatim}
yabby> ls
1BT0.pdb  cox1.seq     LmjFmockup.pep  needle.out
cad3.seq  dna.seq      m2.blocks       README
cad.seq   hmmpfam.out  meme.out        test.yab
yabby> l

 [ UNIX command 'l' failed ]
\end{verbatim}

Because there is no Yabby command {\tt ls}, it was assumed to
be a system command and executed.  The output was printed on
the screen, listing the files and directories in the directory
where Yabby was started.

Subsequently, the command {\tt l} was given but failed because
there is no such Yabby or Unix command. If {\tt l} was an alias
to something (say {\tt ls -CF}) the command would fail regardless,
because Yabby does not know about shell aliases.

There are no inherent limitations to which Unix commands can be
executed within Yabby. It is possible to run a text editor, such
as "vi" (and then simply resume the Yabby session after exiting
the editor), or even start programs with GUI such as "gnuplot",
or a Unix terminal window.

A subtle but important point is that Unix commands are not executed
through the Unix shell.  The consequence of this is that the
(sometimes important) functions provided by the Unix shell, such
as file globing, are not available. For example:

\begin{verbatim}
yabby> ls *
ls: *: No such file or directory

 [ UNIX command 'ls *' failed ]
\end{verbatim}

A handy trick which allows one to go about Unix business is
to temporarily suspend Yabby. This is actually a feature provided
by some unix shells (in combination with the system's terminal driver),
and has little to do with Yabby. In short, typing Ctrl-Z within
Yabby will suspend the current Yabby session, and return user
to the Unix shell. Issuing the command "fg" to the same shell
will return the suspended Yabby session:

\begin{verbatim}
yabby> [Cntrl-Z]
[1]+  Stopped                 yabby
$ ls -CF
1BT0.pdb  cox1.seq     LmjFmockup.pep  needle.out
cad3.seq  dna.seq      m2.blocks       README
cad.seq   hmmpfam.out  meme.out        test.yab
$ fg

yabby>
\end{verbatim}

Ctrl-Z was typed on the first line, which was not echoed back. Note
that the second command prompt (starting with the {\tt \$} character)
is the Unix shell prompt, and typing "fg" had returned user to
the suspended Yabby session.

\section{Understanding how Yabby works}

To understand how Yabby works it is important to understand the
relationship between three directories: the working directory
(where the Yabby session has been started), the Yabby library (this
is the lib/ subdirectory in the the Yabby installation directory),
and the workspace directory.

The workspace directory is created automatically when the Yabby
session is initialized. By default, the workspace directory is
called .yabby, and it is created in the working directory. The
workspace directory contains all the yabby objects created within
the session. Upon a graceful exit from Yabby the workspace directory
is destroyed together with its content.

\subsection{The exchange of data between command scripts}

Consider what happens when the sequences were read from the file
'tom20.fas' to create a sequence object in the workspace:

\begin{verbatim}
1  $ yabby
2 
3   - YABBY version 0.1 -

5   [ 35 command(s) ready ]
6 
7  yabby> seq_load tom20.fas tom20
8  
9   Reading the file 'tom20.fas' ..
10  3 sequence(s) found.
11 
12 yabby> what
13 
14     object(s)      type
15   ------------------------------
16     tom20          seq   
17 
18 yabby> print tom20.seq
19
20 >A.thaliana2 [ A.thaliana2 ]
21 MEFSTADFERFIMFEHARKNSEAQYKNDPLDSENLLKWGGALLELSQFQPIPEAKLMLND
22 AISKLEEALTINPGKHQALWCIANAYTAHAFYVHDPEEAKEHFDKATEYFQRAENEDPGN
23 DTYRKSLDSSLKAPELHMQFMNQGMGQQILGGGGGGGGGGMASSNVSQSSKKKKRNTEFT
24 YDVCGWIILACGIVAWVGMAKSLGPPPPAR
25 >O.sativa [ O.sativa ]
26 MDMGAMSDPERMFFFDLACQNAKVTYEQNPHDADNLARWGGALLELSQMRNGPESLKCLE
27 DAESKLEEALKIDPMKADALWCLGNAQTSHGFFTSDTVKANEFFEKATQCFQKAVDVEPA
28 NDLYRKSLDLSSKAPELHMEIHRQMASQASQAASSTSNTRQSRKKKKDSDFWYDVFGWVV
29 LGVGMVVWVGLAKSNAPPQAPR
30 >L.esculentum [ L.esculentum ]
31 MDMQSDFDRLLFFEHARKTAETTYATDPLDAENLTRWAGALLELSQFQSVSESKKMISDA
32 ISKLEEALEVNPQKHDAIWCLGNAYTSHGFLNPDEDEAKIFFDKAAQCFQQAVDADPENE
33 LYQKSFEVSSKTSELHAQIHKQGPLQQAMGPGPSTTTSSTKGAKKKSSDLKYDVFGWVIL
34 AVGLVAWIGFAKSNMPXPAHPLPR
\end{verbatim}

In line 1 the Yabby was started from the Unix shell command line.
During the initalization process the workspace directory .yabby
was silently created. The command shown on line 7:

\begin{verbatim}
seq_load tom20.fas tom20
\end{verbatim}

will read the sequence from the file 'tom20.fas', and save the sequences
under the name 'tom20'. The command 'what' shown on the line 12 allows
one to inspect the content of the workspace. It shows that the workspace
contains one object, named 'tom20', and this object is of the type 'seq'
(it is a sequence object).  The command 'print' on the line 18 has
printed the object 'tom20' on the terminal screen.

The command 'seq\_load' has executed the script 'seq\_load.pl'
located in the Yabby library. This command reads the FASTA sequence
file and converts it into an Yabby object of the type 'seq'. The
command 'print' has executed the script 'print.pl' from the Yabby
library. The scripts 'seq\_load.pl' and 'print.pl' are completely
independent.

How was the script 'print.pl' aware of the data object 'tom20.seq',
and how was the data transferred between the script 'seq\_load.pl'
to the script 'print.pl' for printing?

This was achieved by the use of the workspace directory. When Yabby
was started (line 1) the workspace directory .yabby/ was silently
created in the current working directory. The newly created object
'tom20' (the result of the command 'seq\_load') was stored in the
file 'tom20.seq' in the workspace directory. On the line 18, the
command 'print' was requested to print the object 'tom20' of the
type 'seq'. The command 'print' has attempted to find the object
'tom20.seq' in the workspace. This was successful, and the object
was loaded from the workspace directory, and then printed on the
terminal screen.

Consider printing a non-existent sequence object:

\begin{verbatim}
yabby> print tom99.seq

 _print_seq:: missing requirement: 'tom99.seq'

 print:: ERROR: system script '_print_seq' failed
 [ error occurred in the subroutine call() ]
 [ command 'print' failed ]
\end{verbatim}

The message above shows that 'tom99.seq' ('tom99' object of the 
type 'seq') is not in the workspace. Furthermore, printing of the
sequence objects is handled by the system script '\_print\_seq.pl',
which is normally invoked by the command script 'print.pl'. Hence
the error message comes from '\_print\_seq'.

Consider printing of an object of a non-exiting type:

\begin{verbatim}
yabby> print tom20.ttt

 print:: ERROR: printing the property 'ttt' not yet implemented
 [ command 'print' failed ]
\end{verbatim}

In this case the error message originated from the command script
'print.pl'. This command script needs to decide which system script
to call for printing. It first determines if the printing of the
requested object type has been implemented, and bombs out if not.

The relationship between the working directory, workspace, and Yabby
library is shown in Figure \ref{fig:session}.

\begin{figure}
\centering
\includegraphics[height=6.0cm]{pics/yabby-session.eps}
\caption{The relationship between the working directory, Yabby
library, and the workspace directory. In this example the Yabby
session was started in the user's home directory (/home/jake).
During the initialization, Yabby has created the workspace directory
(/home/jake/.yabby). The execution of the command seq\_load from
the Yabby shell has executed the script seq\_load.pl from the
library, and has created the tom20.seq object. This object is
stored in the file tom20.seq, created in the workspace directory.}
\label{fig:session}
\end{figure}

\subsection{The workspace}

The workspace is the area where persistent objects are held,
normally an area of the computer memory. In Yabby, the workspace
is an area of the local persistent storage (hard disk), and more
specifically, the workspace directory (or folder). This directory
is automatically created when Yabby is started, and is silently
destroyed upon a graceful exit from Yabby. The consequence of
this is that one can directly inspect what is in Yabby's workspace. 
If the Yabby session has been started in /home/jake, and the
following command is executed:

\begin{verbatim}
yabby> seq_load tom20.fas tom20

 Reading the file 'tom20.fas' ..
 3 sequence(s) found.
\end{verbatim}

This has created the object 'tom20.seq'.  One can inspect this object
directly by using system tools:

\begin{verbatim}
$ cd /home/jake/.yabby
$ ls
tom20.seq
$ cat tom20.seq
<?xml version="1.0"?><seqroot><seqentry><seqid>A.thaliana2</
seqid><comment>A.thaliana2</comment><sequence>MEFSTADFERFIMF
EHARKNSEAQYKNDPLDSENLLKWGGALLELSQFQPIPEAKLMLNDAISKLEEALTINPG
KHQALWCIANAYTAHAFYVHDPEEAKEHFDKATEYFQRAENEDPGNDTYRKSLDSSLKAP
ELHMQFMNQGMGQQILGGGGGGGGGGMASSNVSQSSKKKKRNTEFTYDVCGWIILACGIV
AWVGMAKSLGPPPPAR</sequence></seqentry><seqentry><seqid>O.sat
iva</seqid><comment>O.sativa</comment><sequence>MDMGAMSDPERM
FFFDLACQNAKVTYEQNPHDADNLARWGGALLELSQMRNGPESLKCLEDAESKLEEALKI
DPMKADALWCLGNAQTSHGFFTSDTVKANEFFEKATQCFQKAVDVEPANDLYRKSLDLSS
KAPELHMEIHRQMASQASQAASSTSNTRQSRKKKKDSDFWYDVFGWVVLGVGMVVWVGLA
KSNAPPQAPR</sequence></seqentry><seqentry><seqid>L.esculentu
m</seqid><comment>L.esculentum</comment><sequence>MDMQSDFDRL
LFFEHARKTAETTYATDPLDAENLTRWAGALLELSQFQSVSESKKMISDAISKLEEALEV
NPQKHDAIWCLGNAYTSHGFLNPDEDEAKIFFDKAAQCFQQAVDADPENELYQKSFEVSS
KTSELHAQIHKQGPLQQAMGPGPSTTTSSTKGAKKKSSDLKYDVFGWVILAVGLVAWIGF
AKSNMPXPAHPLPR</sequence></seqentry></seqroot>
\end{verbatim}

The object 'tom20.seq' is a plain text XML document which captures
the data about sequences.

The ability to access the workspace allows one to inspect the
existing objects while they reside in Yabby's "memory", which
often comes handy. For example, this may facilitate the understanding
of an object's internal representation, and helps the implementation
(and debugging) of data models that need to be developed for new
data types. 

\subsection{The data formats}

Currently Yabby implements two different data formats: two-dimensional
list (table) and XML document. In many bioinformatics applications data
can be effectively represented by two dimensional tables. Examples
include the list of atoms in a molecule, the list of atom coordinates,
the list of sequence IDs and their rankings, etc. For such data, two
dimensional tables are the ideal representation. Ocassionally, the objects
which have more complex internal structure need to be handled. For
example, a list of protein sequences where each sequence may need to
be characterized with several attributes: sequence ID, organism name, 
the sequence string consisting of amino acid residues and so on.

As an example for a two-dimensional list data format, consider the
property 'mol' (molecule).

\begin{verbatim}
yabby> mol_load 1BT0.pdb rub

 661 atoms found in the molecule 'rub'

yabby> what

    object(s)      type
  ------------------------------
    rub            mol           
\end{verbatim}

The command 'mol\_load' has created the object 'rub.mol'. This object
is stored in the workspace as the file 'rub.mol':

\begin{verbatim}
$ cd .yabby
/home/jack/data/.yabby
$ ls
rub.mol
$ head rub.mol
1 MET N 
1 MET CA 
1 MET C 
1 MET O 
1 MET CB 
1 MET CG 
1 MET SD 
1 MET CE 
2 LEU N 
2 LEU CA 
\end{verbatim}

This is an example where the two-dimensional list fits perfectly
the data type: a molecule is a list of atoms, and each atom is
characterized by residue number, residue name, and atom name.
This information can be naturally arranged into a two-dimensional
table, with one atom per table row. 

On the other hand, the protein or DNA sequence is not naturally
amenable to the two-dimensional list representation. A sequence
may be characterized by several attributes (sequence ID, organism
name, database specific IDs, and so on) in addition to the sequence
itself, which may contain a few to many thousands of letters. The
matter is further complicated if the data model needs to handle
transparently one or more sequences. Yabby uses XML to represent
such data which requires more capable representations.

Consider the content of the file 'tom20.seq' which contains
the information about three tom20 sequences given previously.
If arranged in a more readable format, this sequence object
looks as given below:

\begin{verbatim}
<?xml version="1.0"?>
<seqroot>
  <seqentry>
    <seqid>A.thaliana2</seqid>
    <comment>A.thaliana2</comment>
    <sequence>MEFSTADFERFIMFEHARKNSEAQYKNDPLDSENLLKWGGALLELSQF
      QPIPEAKLMLNDAISKLEEALTINPGKHQALWCIANAYTAHAFYVHDPEEAKEHFD
      KATEYFQRAENEDPGNDTYRKSLDSSLKAPELHMQFMNQGMGQQILGGGGGGGGGG
      MASSNVSQSSKKKKRNTEFTYDVCGWIILACGIVAWVGMAKSLGPPPPAR
    </sequence>
  </seqentry>
  <seqentry>
    <seqid>O.sativa</seqid>
    <comment>O.sativa</comment>
    <sequence>MDMGAMSDPERMFFFDLACQNAKVTYEQNPHDADNLARWGGALLELSQM
      RNGPESLKCLEDAESKLEEALKIDPMKADALWCLGNAQTSHGFFTSDTVKANEFFEK
      ATQCFQKAVDVEPANDLYRKSLDLSSKAPELHMEIHRQMASQASQAASSTSNTRQSR
      KKKKDSDFWYDVFGWVVLGVGMVVWVGLAKSNAPPQAPR</sequence>
  </seqentry>
  <seqentry>
  <seqid>L.esculentum</seqid>
    <comment>L.esculentum</comment>
    <sequence>MDMQSDFDRLLFFEHARKTAETTYATDPLDAENLTRWAGALLELSQFQS
    VSESKKMISDAISKLEEALEVNPQKHDAIWCLGNAYTSHGFLNPDEDEAKIFFDKAAQC
    FQQAVDADPENELYQKSFEVSSKTSELHAQIHKQGPLQQAMGPGPSTTTSSTKGAKKKS
    SDLKYDVFGWVILAVGLVAWIGFAKSNMPXPAHPLPR</sequence>
  </seqentry>
</seqroot>
\end{verbatim}

It is apparent that each sequence is enclosed by the XML tags
$<$seqentry$>$ and $<$/seqentry$>$. Furthermore, the individual
attributes for each sequence are enclosed in tags such as
$<$seqid$>$ and $<$/seqid$>$, $<$comment$>$ and $<$/comment$>$
and so on. This allows a great flexibility in the representation
of data with complex internal structure. This flexibility comes
at a price, as it requires more effort to design and then implement
an XML based format compared to a two-dimensional list. 

\section{Installation}

\subsection{Checking Perl}

\index{installation}
\index{Perl}
\index{system requirements}

Yabby has been developed with Perl version 5.X, a freely available
general purpose scripting language. Perl stands for "Practical
Extraction and Report Language" and is particularly well suited for
the manipulation of data stored in plain text files. This seems
all too often to be required in bioinformatics applications.

Before attempting the installation, it is highly recommended to
check if the Perl interpreter is present on your computer system.
For example, on a Linux system,

\begin{verbatim}
$ perl -v

This is perl, v5.8.5 built for i386-linux-thread-multi

Copyright 1987-2004, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on
this system using `man perl' or `perldoc perl'.  If you have access to the
Internet, point your browser at http://www.perl.com/, the Perl Home Page.
\end{verbatim}

Perl 5 and later is required for Yabby.  The next step is to find where
exactly is the Perl interpreter located, as this information will be
required for Yabby installation:

\begin{verbatim}
$ which perl
/usr/local/bin/perl
\end{verbatim}

\subsection{Downloading Yabby}

Yabby source code can be browsed from the Google Code servers, at
the URL: http://code.google.com/p/yabby/. Under the
section "Source" one can find the instructions for downloading the
source code. The same page provides the link under "This project's
Subversion repository can be viewed in your web browser" which allows
one to browse the source code on the server without actually
downloading it.

Google servers maintain the source code by the program called 'subversion'
(an open-source version control system).  To download the source code
one needs to use the subversion client program called 'svn'.  The 'svn'
client exists for all mainstream operating systems\footnote{For example,
on Linux CentOS 4 the RPM package 'subversion-1.3.2-1.rhel4.i386.rpm'
provides the subversion client 'svn'.}, for more information see
http://subversion.tigris.org/.  The book about subversion is freely
available on-line at http://svnbook.red-bean.com/. Subversion has
extensive functionality however only the very basic functionality
is needed to download Yabby.  Assuming that the computer is connected
to the internet, the following command will download the latest Yabby 
source code in the current directory:

\begin{verbatim}
$ svn checkout http://yabby.googlecode.com/svn/trunk/ yabby
A    yabby/yabby.pl
A    yabby/LICENSE
A    yabby/lib
A    yabby/lib/blast.pl
A    yabby/lib/hmm_score2seq.pl
A    yabby/lib/seq_strip.pl
A    yabby/lib/seq_comment.pl
A    yabby/lib/hmm_score_extract.pl
A    yabby/lib/blastg.pl
A    yabby/lib/motif_cmp.pl
A    yabby/lib/seq_unique.pl
A    yabby/lib/yabby_seq.pm
....further output deleted....
\end{verbatim}

\subsection{Linux installation}

The installation of Yabby requires that the file 'yabby/lib/yabby.pl' is
modified in order to set the path to the Perl language interpreter,
and to the Yabby libraries.

If Yabby code was downloaded in the directory /home/jake/ (and
therefore the script 'yabby.pl' is in /home/jake/yabby/), to set
the path to Yabby libraries the following two lines need to be
set: 

\begin{verbatim}
use lib "/home/jake/yabby/lib";
$LIB_DIR = "/home/jake/yabby/lib";
\end{verbatim}

The path to the Perl interpreter is set in the first line of the
file 'yabby.pl':

\begin{verbatim}
#!/usr/bin/perl
\end{verbatim}

The script yabby.pl needs to have executable permissions:

\begin{verbatim}
$ chmod +x yabby.pl
\end{verbatim}

If this is all set, running the script 'yabby.pl' will start Yabby:

\begin{verbatim}
$ yabby.pl

 - YABBY version 0.1 - 

 [ 35 command(s) ready ]

yabby>
\end{verbatim}

It is often useful to create a symbolic link in a directory which
is included the PATH variable, such as /usr/local/bin or ~/bin:

\begin{verbatim}
ln -s /home/jake/yabby/yabby.pl /home/jake/bin/yabby
\end{verbatim}

This would allow Yabby to be run from any directory, simply by
typing 'yabby'. When a particular Yabby command relies on an 
external program or library (such as NCBI BLAST), additional
configuration may be needed to allow Yabby to find the appropriate
files.

\section{Windows Installation}

If Cygwin is installed, the UNIX instructions above should work. Both
perl and subversion can be obtained through Cygwin's setup.exe. However 
that Yabby was not tested on Cygwin, use at your own risk.

\subsection{Installing Perl}

The perl interpreter must be installed, we recommend to to install
ActiveState's ActivePerl.

\begin{verbatim}
http://www.activestate.com/Products/activeperl/
\end{verbatim}

At this site one can obtain a free copy of the ActivePerl installer, which
will then need to executed to install the interpreter.

\subsection{Installing a subversion client}

Google servers maintain the Yabby source code by the program called
'subversion' (an open-source version control system).  A powerful and
easy-to-use interface to subversion for Windows is TortoiseSVN. It is
an extension of the shell, so it can be (and must be) used straight
from Windows Explorer. It is available for free download here:

\begin{verbatim}
http://tortoisesvn.net/downloads
\end{verbatim}

Run the .msi executable, and it will install. This will also require
a reboot.

\subsection{Downloading Yabby}

The following instructions refer specifically to TortoiseSVN. It is
recommended, but not necessary, that a new directory be created anywhere in
the filesystem where the Yabby source code is to be located. For example:

C:{$\backslash$}
\begin{description}
  \item[1] create C:{$\backslash$}. Then open this directory.
  
  \item[2] Right-click the empty space and select SVN Checkout.
  
  \item[3] The URL is http://yabby.googlecode.com/svn/trunk/. Click OK.
\end{description}

\subsection{Yabby installation}

Yabby installation requires that the file 'yabby/lib/yabby.pl' is modified to
to set the path to Yabby libraries.

If Yabby code was downloaded in the root directory C:$\backslash$, to set
the path to Yabby libraries the following two lines need to be set (yabby.pl
is stored in binary format, not text. So use Wordpad.exe, found in
{\tt Start Menu$\longrightarrow$All Programs$\rightarrow$Accessories}):
\begin{verbatim}
use lib "C:\\Yabby\\lib";
$LIB_DIR = "C:\\Yabby\\lib";
\end{verbatim}
NOTE: `{$\backslash$}' is an escape character in Perl strings, which is
also the character Windows uses to seperate directories. A double escape
character escapes the escape. Hence the need for two.

If you double-click yabby.pl now, it will start Yabby.
\begin{verbatim}
 - YABBY version 0.1 - 

 [ 38 command(s) ready ]

yabby>
\end{verbatim}

